Embedded machine learning systems for natural language processing: a general framework
Author
Abstract
This paper presents Kenmore, a general framework for knowledge acquisition for natural language processing (NLP) systems. To ease the acquisition of knowledge in new domains, Kenmore exploits an on-line corpus using robust sentence analysis and embedded symbolic machine learning techniques while requiring only minimal human intervention. By treating all problems in ambiguity resolution as classification tasks, the framework uniformly addresses a range of subproblems in sentence analysis, each of which has traditionally required a separate computational mechanism. In a series of experiments, we demonstrate the successful use of Kenmore for learning solutions to several problems in lexical and structural ambiguity resolution. We argue that the learning and knowledge acquisition components should be embedded components of the NLP system in that (1) learning should take place within the larger natural language understanding system as it processes text, and (2) the learning components should be evaluated in the context of practical language-processing tasks.
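As a rough illustration of the idea of casting different ambiguity problems as classification tasks, the sketch below reuses one classifier for two unrelated subproblems. The abstract does not name the learning algorithm or the features Kenmore uses, so everything here is an assumption made for illustration: the nearest-neighbour learner, the `overlap` and `classify` helpers, the feature names, and the toy training cases.

```python
from collections import Counter


def overlap(a, b):
    """Count the features on which two cases have the same value."""
    return sum(1 for key in a if key in b and a[key] == b[key])


def classify(case, training, k=3):
    """Label `case` by majority vote among its k most similar training cases."""
    ranked = sorted(training, key=lambda ex: overlap(case, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: lexical ambiguity (part of speech of "run").
pos_cases = [
    ({"prev": "the", "next": "was"}, "noun"),
    ({"prev": "a", "next": "of"}, "noun"),
    ({"prev": "to", "next": "the"}, "verb"),
    ({"prev": "will", "next": "the"}, "verb"),
]

# Hypothetical toy data: structural ambiguity (prepositional-phrase attachment).
pp_cases = [
    ({"verb": "ate", "prep": "with", "obj": "fork"}, "verb-attach"),
    ({"verb": "ate", "prep": "with", "obj": "sauce"}, "noun-attach"),
    ({"verb": "saw", "prep": "with", "obj": "telescope"}, "verb-attach"),
    ({"verb": "bought", "prep": "with", "obj": "stripes"}, "noun-attach"),
]

# The same classify() call handles both problems; only the cases differ.
pos_label = classify({"prev": "to", "next": "a"}, pos_cases, k=1)
pp_label = classify({"verb": "cut", "prep": "with", "obj": "fork"}, pp_cases, k=1)
print(pos_label, pp_label)
```

The point of the sketch is only that a single mechanism can serve several subproblems once each is encoded as labelled feature vectors; it is not a reconstruction of the system described in the paper.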
Similar resources
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two expressions are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Transparent Machine Learning Algorithm Offers Useful Prediction Method for Natural Gas Density
Machine-learning algorithms aid predictions for complex systems with multiple influencing variables. However, many neural-network related algorithms behave as black boxes in terms of revealing how the prediction of each data record is performed. This drawback limits their ability to provide detailed insights concerning the workings of the underlying system, or to relate predictions to specific ...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrid methods. Machine-learning-based methods perform well on the Persian language if they are trained with good features. To get good performanc...
Machine Learning of Natural Language
In this article we provide an overview of recent research on the application of symbolic Machine Learning techniques to language data (Machine Learning of Natural Language, MLNL). Both in Quantitative Linguistics (QL) and in MLNL, the main goal is to describe the language as it is observed with rules, language models, or other descriptions. But whereas the motivation in QL is purely scientific ...
Natural Language Processing of Textual Requirements
Natural language processing (NLP) is the application of automated parsing and machine learning techniques to analyze standard text. Applications of NLP to requirements engineering include extraction of ontologies from a requirements specification, and use of NLP to verify the consistency and/or completion of a requirements specification. This work-in-progress paper describes a new approach to t...